Abstract. The theory of graph games with $\omega$-regular winning conditions
is the foundation for modeling and synthesizing reactive processes.
In the case of stochastic reactive processes, the corresponding stochastic
graph games have three players, two of them (System and Environment)
behaving adversarially, and the third (Uncertainty) behaving probabilistically.
We consider two problems for stochastic graph games: the qualitative
problem asks for the set of states from which a player can win with
probability 1 (almost-sure winning); the quantitative problem asks for
the maximal probability of winning (optimal winning) from each state.
We show that for Rabin winning conditions, both problems are in NP. As
these problems were known to be NP-hard, it follows that they are
NP-complete for Rabin conditions, and dually, coNP-complete for Streett
conditions. The proof proceeds by showing that pure memoryless strategies
suffice for qualitatively and quantitatively winning stochastic graph
games with Rabin conditions. This insight is of interest in its own right, as
it implies that controllers for Rabin objectives have simple implementations.
We also prove that for every $\omega$-regular condition, optimal winning
strategies are no more complex than almost-sure winning strategies.


1  Introduction 

A stochastic graph game [5] is played on a directed graph with three kinds of
states: player-1, player-2, and probabilistic states. At player-1 states, player 1
chooses a successor state; at player-2 states, player 2 chooses a successor state;
and at probabilistic states, a successor state is chosen according to a given
probability distribution. The result of playing the game forever is an infinite path
through the graph. If there are no probabilistic states, we refer to the game as
a 2-player graph game; otherwise, as a 2½-player graph game. There has been
a long history of using 2-player graph games for modeling and synthesizing
reactive processes [1, 14, 16]: a reactive system and its environment represent the
two players, whose states and transitions are specified by the states and edges
of a game graph. Consequently, 2½-player graph games provide the theoretical
foundation for modeling and synthesizing processes that are both reactive and
stochastic [9, 15].

For the modeling and synthesis (or "control") of reactive processes, one
traditionally considers $\omega$-regular winning conditions, which naturally express the
temporal specifications and fairness assumptions of transition systems [11]. This
paper focuses on the complexity of solving 2½-player graph games with respect
to two important normal forms of $\omega$-regular winning conditions: Rabin
conditions and Streett conditions [17]. Rabin and Streett conditions are dual (i.e.,
complementary), and their practical relevance stems from the fact that their
form corresponds to the form of fairness conditions for transition systems.

In the case of 2-player graph games, where no randomization is involved, a
fundamental determinacy result ensures that, given an $\omega$-regular winning
condition, at each state, either player 1 has a strategy to ensure that the condition
holds, or player 2 has a strategy to ensure that the condition does not hold [10].
Thus, the problem of solving 2-player graph games consists in finding the set of
winning states, from which player 1 can ensure that the condition holds. This
problem is known to be in NP $\cap$ coNP for parity conditions, to be NP-complete
for Rabin conditions [8], and consequently, to be coNP-complete for Streett
conditions. The proofs of inclusion in NP rely on the existence of pure (i.e.,
deterministic) memoryless winning strategies, which act as polynomial witnesses. The
existence of pure memoryless winning strategies is also of independent interest,
as such strategies can be simply and effectively implemented by a controller.
Note that for Streett conditions, winning strategies in general require memory.

In the case of 2½-player graph games, where randomization is present in the
transition structure, the notion of winning needs to be clarified. Player 1 is said to
win surely if she has a strategy that guarantees to achieve the winning condition
against all player-2 strategies. While this is the classical notion of winning in
the 2-player case, it is less meaningful in the presence of probabilistic states,
because it makes all probabilistic choices adversarial (it treats them analogously
to player-2 choices). To adequately treat probabilistic choice, we consider the
probability with which player 1 can ensure that the winning condition is met.
We thus define two solution problems for 2½-player graph games: the qualitative
problem asks for the set of states from which player 1 can ensure winning with
probability 1; the quantitative problem asks for the maximal probability with
which player 1 can ensure winning from each state (this probability is called the
value of the game at a state) [7]. Correspondingly, we define almost-sure winning
strategies, which enable player 1 to win with probability 1 whenever possible,
and optimal strategies, which enable player 1 to win with maximal probability.
The main result of this paper is that, in 2½-player graph games, both the
qualitative and the quantitative solution problems are NP-complete in the case of
Rabin conditions, and coNP-complete in the case of Streett conditions. The
NP-hardness for Rabin conditions follows from the NP-hardness of 2-player games
with Rabin conditions [8]; we establish the membership in NP. Both questions
are known to be in NP $\cap$ coNP for the more restrictive, self-dual case of parity
conditions [4, 13, 18], whose exact complexity is an important open problem.

Our proof of membership in NP for stochastic Rabin games relies on
establishing the existence of pure memoryless almost-sure winning and optimal
strategies. The corresponding result for stochastic parity games has been proved
only recently [4, 13, 18], and these proofs rely on the self-duality of parity
conditions. For Rabin conditions, a new proof approach is required. First, we show the
existence of pure memoryless almost-sure winning strategies in stochastic Rabin
games by a reduction from 2½-player games to 2-player games. The reduction
preserves the ability of player 1 to win with probability 1, but it does not preserve
the maximal probability of winning. The proof technique is combinatorial and
uses graph-theoretic arguments to account for the fact that Rabin conditions
are not closed under complementation. Second, to show the existence of pure
memoryless optimal strategies in stochastic Rabin games, we partition the game
graph into value classes, each consisting of states where the value of the game is
identical. We prove that if the players play according to optimal strategies, then
the game leaves every intermediate value class (in which the value is neither 0
nor 1) with probability 1. We then use the qualitative result on almost-sure
winning to establish the existence of pure memoryless optimal strategies.

We emphasize that, as mentioned earlier, the existence of pure memoryless
strategies is relevant in its own right, as such strategies consist in mappings
that associate with each player-1 state a unique successor, without need for
randomization or memory; such mappings are easily implemented in controllers.
Furthermore, our techniques lead us to a more general result, which states a
strong connection between certain qualitative and quantitative games: we show
that for every $\omega$-regular winning condition in a 2½-player game graph, if a
restricted family of strategies suffices for almost-sure winning, then it suffices
also for optimality. Hence future research on 2½-player games with $\omega$-regular
conditions can focus on qualitatively (i.e., almost-sure) winning strategies, and
our result generalizes these strategies to quantitatively winning (i.e., optimal)
strategies.

2  Definitions 

We consider several classes of turn-based games, namely, two-player turn-based
probabilistic games (2½-player games), two-player turn-based deterministic
games (2-player games), and Markov decision processes (1½-player games).

Game graphs. A turn-based probabilistic game graph (2½-player game graph)
$G = ((S, E), (S_1, S_2, S_q), \delta)$ consists of a directed graph $(S, E)$, a partition
$(S_1, S_2, S_q)$ of the finite set $S$ of states, and a probabilistic transition function
$\delta\colon S_q \to \mathcal{D}(S)$, where $\mathcal{D}(S)$ denotes the set of probability distributions
over the state space $S$. The states in $S_1$ are the player-1 states, where player 1
decides the successor state; the states in $S_2$ are the player-2 states, where player 2
decides the successor state; and the states in $S_q$ are the probabilistic states, where
the successor state is chosen according to the probabilistic transition function $\delta$.
We assume that for $s \in S_q$ and $t \in S$, we have $(s, t) \in E$ iff $\delta(s)(t) > 0$, and we
often write $\delta(s, t)$ for $\delta(s)(t)$. For technical convenience we assume that every
state in the graph $(S, E)$ has at least one outgoing edge. For a state $s \in S$, we
write $E(s)$ to denote the set $\{ t \in S \mid (s, t) \in E \}$ of possible successors.

A set $U \subseteq S$ of states is called $\delta$-closed if for every probabilistic state
$u \in U \cap S_q$, if $(u, t) \in E$, then $t \in U$. The set $U$ is called $\delta$-live if for
every nonprobabilistic state $s \in U \cap (S_1 \cup S_2)$, there is a state $t \in U$ such that
$(s, t) \in E$. A $\delta$-closed and $\delta$-live subset $U$ of $S$ induces a subgame graph of $G$,
indicated by $G \upharpoonright U$.
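
The definitions above translate directly into a small data structure. The following is a minimal sketch, not part of the formal development: the class name `GameGraph`, its field names, and the helper functions are illustrative choices, and the example graph at the end is invented for the purpose of the sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GameGraph:
    """Illustrative container for G = ((S, E), (S1, S2, Sq), delta)."""
    s1: frozenset   # player-1 states
    s2: frozenset   # player-2 states
    sq: frozenset   # probabilistic states
    edges: dict     # state -> set of successors, i.e. E(s)
    delta: dict     # probabilistic state -> {successor: probability}

    @property
    def states(self):
        return self.s1 | self.s2 | self.sq

    def is_well_formed(self):
        """Check the standing assumptions stated in the text."""
        for s in self.states:
            if not self.edges.get(s):           # every state has a successor
                return False
        for s in self.sq:
            dist = self.delta[s]
            support = {t for t, p in dist.items() if p > 0}
            if support != set(self.edges[s]):   # (s,t) in E iff delta(s)(t) > 0
                return False
            if not math.isclose(sum(dist.values()), 1.0):
                return False
        return True


def is_delta_closed(g, U):
    """Every probabilistic state in U has all of its successors in U."""
    return all(set(g.edges[u]) <= U for u in U & g.sq)


def is_delta_live(g, U):
    """Every nonprobabilistic state in U has some successor in U."""
    return all(set(g.edges[s]) & U for s in U & (g.s1 | g.s2))


# Invented example: a is a player-1 state, b a player-2 state,
# c a probabilistic state that moves to a or stays in c with probability 1/2.
g = GameGraph(
    s1=frozenset({"a"}), s2=frozenset({"b"}), sq=frozenset({"c"}),
    edges={"a": {"b", "c"}, "b": {"a"}, "c": {"a", "c"}},
    delta={"c": {"a": 0.5, "c": 0.5}},
)
```

In this example the set {a, b} is both closed and live, so it induces a subgame graph, whereas {c} is not closed (the probabilistic state c can move to a) and {b} is not live (b has no successor inside the set).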

The turn-based deterministic game graphs (2-player game graphs) are the
special case of the 2½-player game graphs with $S_q = \emptyset$. The Markov decision
processes (1½-player game graphs) are the special case of the 2½-player game
graphs with $S_1 = \emptyset$ or $S_2 = \emptyset$. We refer to the MDPs with $S_2 = \emptyset$ as player-1
MDPs, and to the MDPs with $S_1 = \emptyset$ as player-2 MDPs.

Plays and strategies. An infinite path, or play, of the game graph $G$ is an
infinite sequence $\omega = (s_0, s_1, s_2, \ldots)$ of states such that $(s_k, s_{k+1}) \in E$ for all
$k \in \mathbb{N}$. We write $\Omega$ for the set of all plays, and for a state $s \in S$, we write
$\Omega_s \subseteq \Omega$ for the set of plays that start from the state $s$.

A strategy for player 1 is a function $\sigma\colon S^* \cdot S_1 \to \mathcal{D}(S)$ that assigns a
probability distribution to all finite sequences $w \in S^* \cdot S_1$ of states ending in a
player-1 state (the sequence represents a prefix of a play). Player 1 follows the
strategy $\sigma$ if in each player-1 move, given that the current history of the game is
$w \in S^* \cdot S_1$, she chooses the next state according to the probability distribution
$\sigma(w)$. A strategy must prescribe only available moves, i.e., for all $w \in S^*$, $s \in S_1$,
and $t \in S$, if $\sigma(w \cdot s)(t) > 0$, then $(s, t) \in E$. The strategies for player 2 are
defined analogously. We denote by $\Sigma$ and $\Pi$ the set of all strategies for player 1
and player 2, respectively.

Once a starting state $s \in S$ and strategies $\sigma \in \Sigma$ and $\pi \in \Pi$ for the two
players are fixed, the outcome of the game is a random walk $\omega_s^{\sigma,\pi}$ for which
the probabilities of events are uniquely defined, where an event $A \subseteq \Omega$ is a
measurable set of paths. Given strategies $\sigma$ for player 1 and $\pi$ for player 2,
a play $\omega = (s_0, s_1, s_2, \ldots)$ is feasible if for every $k \in \mathbb{N}$ the following three
conditions hold: (1) if $s_k \in S_q$, then $(s_k, s_{k+1}) \in E$; (2) if $s_k \in S_1$, then
$\sigma(s_0, s_1, \ldots, s_k)(s_{k+1}) > 0$; and (3) if $s_k \in S_2$, then $\pi(s_0, s_1, \ldots, s_k)(s_{k+1}) > 0$.
Given two strategies $\sigma \in \Sigma$ and $\pi \in \Pi$, and a state $s \in S$, we denote by
$\mathrm{Outcome}(s, \sigma, \pi) \subseteq \Omega_s$ the set of feasible plays that start from $s$ given strategies
$\sigma$ and $\pi$. For a state $s \in S$ and an event $A \subseteq \Omega$, we write $\Pr_s^{\sigma,\pi}(A)$ for the
probability that a path belongs to $A$ if the game starts from the state $s$ and
the players follow the strategies $\sigma$ and $\pi$, respectively. In the context of player-1
MDPs we often omit the argument $\pi$, because $\Pi$ is a singleton set.

We classify strategies according to their use of randomization and memory.
The strategies that do not use randomization are called pure. A player-1
strategy $\sigma$ is pure if for all $w \in S^*$ and $s \in S_1$, there is a state $t \in S$ such that
$\sigma(w \cdot s)(t) = 1$. We denote by $\Sigma^P \subseteq \Sigma$ the set of pure strategies for player 1. A
strategy that is not necessarily pure is called randomized. Let $M$ be a set called
memory. A player-1 strategy can be described as a pair of functions: a
memory-update function $\sigma_u\colon S \times M \to M$ and a next-move function $\sigma_m\colon S_1 \times M \to \mathcal{D}(S)$.
The strategy $(\sigma_u, \sigma_m)$ is finite-memory if the memory $M$ is finite. We denote by
$\Sigma^F$ the set of finite-memory strategies for player 1, and by $\Sigma^{PF}$ the set of pure
finite-memory strategies; that is, $\Sigma^{PF} = \Sigma^P \cap \Sigma^F$. The strategy $(\sigma_u, \sigma_m)$ is
memoryless if $|M| = 1$; that is, the next move does not depend on the history of
the play but only on the current state. A memoryless player-1 strategy can be
represented as a function $\sigma\colon S_1 \to \mathcal{D}(S)$. A pure memoryless strategy is a pure
strategy that is memoryless. A pure memoryless strategy for player 1 can be
represented as a function $\sigma\colon S_1 \to S$. We denote by $\Sigma^M$ the set of memoryless
strategies for player 1, and by $\Sigma^{PM}$ the set of pure memoryless strategies; that
is, $\Sigma^{PM} = \Sigma^P \cap \Sigma^M$. Analogously we define the corresponding strategy families
$\Pi^P$, $\Pi^F$, $\Pi^{PF}$, $\Pi^M$, and $\Pi^{PM}$ for player 2.

Given a finite-memory strategy $\sigma \in \Sigma^F$, let $G_\sigma$ be the game graph obtained
from $G$ under the constraint that player 1 follows the strategy $\sigma$. The
corresponding definition $G_\pi$ for a player-2 strategy $\pi \in \Pi^F$ is analogous, and we
write $G_{\sigma,\pi}$ for the game graph obtained from $G$ if both players follow the
finite-memory strategies $\sigma$ and $\pi$, respectively. Observe that given a 2½-player game
graph $G$ and a memoryless player-1 strategy $\sigma$, the result $G_\sigma$ is a player-2 MDP.
Similarly, for a player-1 MDP $G$ and a memoryless player-1 strategy $\sigma$, the
result $G_\sigma$ is a Markov chain. Hence, if $G$ is a 2½-player game graph and the two
players follow memoryless strategies $\sigma$ and $\pi$, the result $G_{\sigma,\pi}$ is a Markov chain.
These observations will be useful in the analysis of 2½-player games.
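
The observation that fixing pure memoryless strategies collapses a game graph into an MDP or a Markov chain can be illustrated with a short sketch. The function names, the dictionary-based encoding of the graph, and the example graph are assumptions made for illustration only.

```python
def restrict_player1(edges, s1, sigma):
    """Edges of the graph after player 1 commits to the pure memoryless
    strategy sigma (a dict from player-1 states to successors).
    Player-1 states keep only the chosen edge; all other states are unchanged,
    so only player 2 and the probabilistic states still have real choices."""
    restricted = {}
    for s, succs in edges.items():
        if s in s1:
            if sigma[s] not in succs:
                raise ValueError(f"sigma picks an unavailable move at {s!r}")
            restricted[s] = {sigma[s]}
        else:
            restricted[s] = set(succs)
    return restricted


def induced_markov_chain(edges, s1, s2, delta, sigma, pi):
    """Transition probabilities when both players follow pure memoryless
    strategies sigma and pi. Returns state -> {successor: probability}."""
    chain = {}
    for s, succs in edges.items():
        if s in s1 or s in s2:
            choice = sigma[s] if s in s1 else pi[s]
            if choice not in succs:
                raise ValueError(f"strategy picks an unavailable move at {s!r}")
            chain[s] = {choice: 1.0}      # a pure choice is a point distribution
        else:
            chain[s] = dict(delta[s])     # probabilistic states keep delta
    return chain


# Invented example: a is a player-1 state, b a player-2 state, c probabilistic.
edges = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "c"}}
delta = {"c": {"a": 0.5, "c": 0.5}}
sigma = {"a": "c"}
pi = {"b": "a"}

g_sigma = restrict_player1(edges, {"a"}, sigma)
chain = induced_markov_chain(edges, {"a"}, {"b"}, delta, sigma, pi)
```

Here `g_sigma` leaves a single outgoing edge at every player-1 state, and `chain` assigns a full probability distribution to every state, which is exactly the data of a Markov chain.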

Objectives. An objective for a player consists of an $\omega$-regular set of winning
plays $\Phi \subseteq \Omega$ [17]. In this paper we study zero-sum games [9, 15], where the
objectives of the two players are complementary; that is, if the objective of one
player is $\Phi$, then the objective of the other player is $\Omega \setminus \Phi$. We consider
$\omega$-regular objectives specified in Rabin or Streett normal forms. For a play $\omega =
(s_0, s_1, s_2, \ldots)$, let $\mathrm{Inf}(\omega)$ be the set $\{ s \in S \mid s = s_k \text{ for infinitely many } k \ge 0 \}$
of states that occur infinitely often in $\omega$. We use colors to define objectives
independent of game graphs. For a set $C$ of colors, we write $[\cdot]\colon C \to 2^S$ for a
function that maps each color to a set of states. Inversely, given a set $U \subseteq S$ of
states, we write $[U] = \{ c \in C \mid [c] \cap U \neq \emptyset \}$ for the set of colors that occur
in $U$. Note that a state can have multiple colors.

A Rabin objective is specified as a set $P = \{(e_1, f_1), \ldots, (e_d, f_d)\}$ of pairs
of colors $e_i, f_i \in C$. Intuitively, the Rabin condition $P$ requires that for some
$1 \le i \le d$, all states of color $e_i$ be visited finitely often and some state of
color $f_i$ be visited infinitely often. Let $[P] = \{(E_1, F_1), \ldots, (E_d, F_d)\}$ be the
corresponding set of so-called Rabin pairs, where $E_i = [e_i]$ and $F_i = [f_i]$ for all
$1 \le i \le d$. Formally, the set of winning plays is
$\mathrm{Rabin}(P) = \{ \omega \in \Omega \mid \exists\, 1 \le i \le d.\ (\mathrm{Inf}(\omega) \cap E_i = \emptyset \wedge \mathrm{Inf}(\omega) \cap F_i \neq \emptyset) \}$.
Without loss of generality, we require that $\bigcup_{i \in \{1, 2, \ldots, d\}} (E_i \cup F_i) = S$.
The parity (or Rabin-chain) objectives are the special case of Rabin objectives
such that $E_1 \subset F_1 \subset E_2 \subset F_2 \subset \cdots \subset E_d \subset F_d$.
A Streett objective is again specified as a set $P = \{(e_1, f_1), \ldots, (e_d, f_d)\}$
of pairs of colors. The Streett condition $P$ requires that for each $1 \le i \le d$,
if some state of color $f_i$ is visited infinitely often, then some state of color $e_i$
be visited infinitely often. Formally, the set of winning plays is
$\mathrm{Streett}(P) = \{ \omega \in \Omega \mid \forall\, 1 \le i \le d.\ (\mathrm{Inf}(\omega) \cap E_i \neq \emptyset \vee \mathrm{Inf}(\omega) \cap F_i = \emptyset) \}$,
for the set $[P] = \{(E_1, F_1), \ldots, (E_d, F_d)\}$ of so-called Streett pairs. Note that the Rabin
and Streett objectives are dual; i.e., the complement of a Rabin objective is
a Streett objective, and vice versa. Moreover, every parity objective is both a
Rabin objective and a Streett objective.
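
Since both conditions depend on a play only through the set of states it visits infinitely often, they can be sketched as predicates on that set. The function names and the example pairs below are illustrative assumptions; the last lines check the stated duality by brute force on a four-state example.

```python
from itertools import combinations


def satisfies_rabin(inf_states, pairs):
    """Some pair (E, F): no E-state occurs infinitely often, some F-state does."""
    return any(not (inf_states & E) and bool(inf_states & F) for E, F in pairs)


def satisfies_streett(inf_states, pairs):
    """Every pair (E, F): an E-state recurs, or no F-state recurs."""
    return all(bool(inf_states & E) or not (inf_states & F) for E, F in pairs)


# Invented example with d = 2 pairs over the states {0, 1, 2, 3}.
pairs = [({0}, {1}), ({2}, {3})]

rabin_on_1 = satisfies_rabin({1}, pairs)        # pair 1: E avoided, F visited
rabin_on_01 = satisfies_rabin({0, 1}, pairs)    # pair 1 blocked, pair 2 has no F

# Duality check: on every nonempty candidate for Inf, the Streett predicate
# with the same pairs is the negation of the Rabin predicate.
states = [0, 1, 2, 3]
dual_everywhere = all(
    satisfies_streett(set(sub), pairs) == (not satisfies_rabin(set(sub), pairs))
    for r in range(1, len(states) + 1)
    for sub in combinations(states, r)
)
```

The duality is visible in the code itself: `satisfies_streett` is the De Morgan negation of `satisfies_rabin` applied pair by pair.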

Sure winning, almost-sure winning, and optimality. Given a player-1
objective $\Phi$, a strategy $\sigma \in \Sigma$ is sure winning for player 1 from a state $s \in S$
if for every strategy $\pi \in \Pi$ for player 2, we have $\mathrm{Outcome}(s, \sigma, \pi) \subseteq \Phi$.
The strategy $\sigma$ is almost-sure winning for player 1 from the state $s$ for the
objective $\Phi$ if for every player-2 strategy $\pi$, we have $\Pr_s^{\sigma,\pi}(\Phi) = 1$. The sure
and almost-sure winning strategies for player 2 are defined analogously. Given
an objective $\Phi$, the sure winning set $\langle\langle 1 \rangle\rangle_{\mathit{sure}}(\Phi)$ for player 1 is the set of states
from which player 1 has a sure winning strategy. The almost-sure winning set
$\langle\langle 1 \rangle\rangle_{\mathit{almost}}(\Phi)$ for player 1 is the set of states from which player 1 has an
almost-sure winning strategy. The sure winning set $\langle\langle 2 \rangle\rangle_{\mathit{sure}}(\Omega \setminus \Phi)$ and the almost-sure
winning set $\langle\langle 2 \rangle\rangle_{\mathit{almost}}(\Omega \setminus \Phi)$ for player 2 are defined analogously. It follows from
the definitions that for all 2½-player game graphs and all objectives $\Phi$, we have
$\langle\langle 1 \rangle\rangle_{\mathit{sure}}(\Phi) \subseteq \langle\langle 1 \rangle\rangle_{\mathit{almost}}(\Phi)$. Computing sure and almost-sure winning sets and
strategies is referred to as the qualitative analysis of 2½-player games [7].

Given $\omega$-regular objectives $\Phi \subseteq \Omega$ for player 1 and $\Omega \setminus \Phi$ for player 2, we define
the value functions $\langle\langle 1 \rangle\rangle_{\mathit{val}}$ and $\langle\langle 2 \rangle\rangle_{\mathit{val}}$ for the players 1 and 2, respectively, as
the following functions from the state space $S$ to the interval $[0, 1]$ of reals: for all
states $s \in S$, let
$\langle\langle 1 \rangle\rangle_{\mathit{val}}(\Phi)(s) = \sup_{\sigma \in \Sigma} \inf_{\pi \in \Pi} \Pr_s^{\sigma,\pi}(\Phi)$ and
$\langle\langle 2 \rangle\rangle_{\mathit{val}}(\Omega \setminus \Phi)(s) = \sup_{\pi \in \Pi} \inf_{\sigma \in \Sigma} \Pr_s^{\sigma,\pi}(\Omega \setminus \Phi)$.
In other words, the value $\langle\langle 1 \rangle\rangle_{\mathit{val}}(\Phi)(s)$ gives the
maximal probability with which player 1 can achieve her objective $\Phi$ from state $s$,
and analogously for player 2. The strategies that achieve the value are called
optimal: a strategy $\sigma$ for player 1 is optimal from the state $s$ for the objective $\Phi$
if $\langle\langle 1 \rangle\rangle_{\mathit{val}}(\Phi)(s) = \inf_{\pi \in \Pi} \Pr_s^{\sigma,\pi}(\Phi)$. The optimal strategies for player 2 are
defined analogously. Computing values is referred to as the quantitative analysis
of 2½-player games. The set of states with value 1 is called the limit-sure winning
set [7]. For 2½-player game graphs with $\omega$-regular objectives the almost-sure
and limit-sure winning sets coincide [3].

Let $C \in \{P, M, F, PM, PF\}$ and consider the family $\Sigma^C \subseteq \Sigma$ of special
strategies for player 1. We say that the family $\Sigma^C$ suffices with respect to a
player-1 objective $\Phi$ on a class $\mathcal{G}$ of game graphs for sure winning if for every
game graph $G \in \mathcal{G}$ and state $s \in \langle\langle 1 \rangle\rangle_{\mathit{sure}}(\Phi)$, there is a player-1 strategy $\sigma \in \Sigma^C$
such that for every player-2 strategy $\pi \in \Pi$, we have $\mathrm{Outcome}(s, \sigma, \pi) \subseteq \Phi$.
Similarly, the family $\Sigma^C$ suffices with respect to the objective $\Phi$ on the class
$\mathcal{G}$ of game graphs for almost-sure winning if for every game graph $G \in \mathcal{G}$ and
state $s \in \langle\langle 1 \rangle\rangle_{\mathit{almost}}(\Phi)$, there is a player-1 strategy $\sigma \in \Sigma^C$ such that for every
player-2 strategy $\pi \in \Pi$, we have $\Pr_s^{\sigma,\pi}(\Phi) = 1$; and for optimality, if for every
game graph $G \in \mathcal{G}$ and state $s \in S$, there is a player-1 strategy $\sigma \in \Sigma^C$ such
that $\langle\langle 1 \rangle\rangle_{\mathit{val}}(\Phi)(s) = \inf_{\pi \in \Pi} \Pr_s^{\sigma,\pi}(\Phi)$.

For sure winning, the 1½-player and 2½-player games coincide with 2-player
(deterministic) games where the random player (who chooses the successor at the
probabilistic states) is interpreted as an adversary, i.e., as player 2. Theorem 1
and Theorem 2 state the classical determinacy results for 2-player and 2½-player
game graphs with $\omega$-regular objectives.

Theorem 1 (Qualitative determinacy [8, 10]). For all 2-player game graphs
and Rabin or Streett objectives $\Phi$, we have $\langle\langle 1 \rangle\rangle_{\mathit{sure}}(\Phi) \cap \langle\langle 2 \rangle\rangle_{\mathit{sure}}(\Omega \setminus \Phi) = \emptyset$
and $\langle\langle 1 \rangle\rangle_{\mathit{sure}}(\Phi) \cup \langle\langle 2 \rangle\rangle_{\mathit{sure}}(\Omega \setminus \Phi) = S$. Moreover, on 2-player game graphs, the
family of pure memoryless strategies suffices for sure winning with respect to
Rabin objectives, and the family of pure finite-memory strategies suffices for
sure winning with respect to Streett objectives.

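Theorem 2 (Quantitative determinacy [12]). For all 2½-player game
graphs, all Rabin or Streett objectives $\Phi$, and all states $s$, we have
$\langle\langle 1 \rangle\rangle_{\mathit{val}}(\Phi)(s) + \langle\langle 2 \rangle\rangle_{\mathit{val}}(\Omega \setminus \Phi)(s) = 1$.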


3  Qualitative  Analysis 

We show that the pure memoryless strategies suffice for almost-sure winning with
respect to Rabin objectives on 2½-player game graphs. The result is achieved
by a reduction to 2-player Rabin games. The reduction also allows us to apply
algorithms for solving 2-player Rabin games to the qualitative analysis of
2½-player Rabin games. Furthermore, in the next section, we will use the existence
of pure memoryless almost-sure winning strategies to prove the existence of pure
memoryless optimal strategies.

End components of MDPs. We review some facts about end components [6]
which are needed for the further development of the paper. We consider player-1
MDPs and hence strategies for player 1. Let $G = ((S, E), (S_1, S_2, S_q), \delta)$ with
$S_2 = \emptyset$ be a 1½-player game graph.

Definition 1 (End components). A set $U \subseteq S$ of states is an end component
if $U$ is $\delta$-closed and the subgame graph $G \upharpoonright U$ is strongly connected.

We denote by $\mathcal{E} \subseteq 2^S$ the set of all end components of $G$. The next lemma
states that, under every strategy (memoryless or not), with probability 1 the
set of states visited infinitely often along a play is an end component. This
lemma allows us to derive conclusions on the (infinite) set of plays in an MDP
by analyzing the (finite) set of end components in the MDP. In particular, the
lemma implies that to show that a set $\{(E_1, F_1), \ldots, (E_d, F_d)\}$ of Rabin pairs
is satisfied with probability 1, it suffices to show that for each reachable end
component $U$, there exists an $1 \le i \le d$ such that $U \cap E_i = \emptyset$ and $U \cap F_i \neq \emptyset$. To
state the lemma, for $s \in S$ and $U \subseteq S$, we define $\Omega_s^U = \{ \omega \in \Omega_s \mid \mathrm{Inf}(\omega) = U \}$.
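
The two checks used in this argument, whether a set is an end component and whether an end component satisfies some Rabin pair, can be sketched as follows. This is an illustrative sketch for player-1 MDPs only: the function names and the example MDP are assumptions, and the requirement that every state of the set keeps a successor inside the set is made explicit here so that the set induces a subgame graph.

```python
def _reach(start, succ):
    """States reachable from start in the graph given by succ (a dict)."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for t in succ[s]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def is_end_component(edges, sq, U):
    """U is delta-closed and the graph restricted to U is strongly connected."""
    U = set(U)
    if not U:
        return False
    # delta-closed: probabilistic states cannot leave U
    if any(not set(edges[u]) <= U for u in U & set(sq)):
        return False
    fwd = {s: set(edges[s]) & U for s in U}
    if any(not fwd[s] for s in U):        # every state needs a successor in U
        return False
    bwd = {s: set() for s in U}
    for s, succs in fwd.items():
        for t in succs:
            bwd[t].add(s)
    root = next(iter(U))
    # strongly connected: root reaches everything and everything reaches root
    return _reach(root, fwd) == U and _reach(root, bwd) == U


def is_winning_end_component(U, rabin_pairs):
    """Some pair (E, F) has U disjoint from E and U intersecting F."""
    U = set(U)
    return any(not (U & E) and bool(U & F) for E, F in rabin_pairs)


# Invented player-1 MDP: a and b are player-1 states, c is probabilistic.
edges = {"a": {"a", "c"}, "b": {"b"}, "c": {"a", "b"}}
sq = {"c"}
rabin_pairs = [({"b"}, {"a"})]            # avoid b eventually, revisit a
```

In this example {a} and {b} are end components while {a, c} is not, because the probabilistic state c can move to b. Only {a} satisfies the single Rabin pair, so a strategy that lets the play settle in {b} with positive probability cannot be almost-sure winning.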

Lemma 1. [6] For all states $s \in S$ and strategies $\sigma \in \Sigma$, we have $\Pr_s^{\sigma}\bigl(\bigcup_{U \in \mathcal{E}} \Omega_s^U\bigr) = 1$.

Fig. 1. Gadget for the reduction of 2½-player Rabin games to 2-player Rabin games.


Reduction. Given a 2½-player game graph $G = ((S, E), (S_1, S_2, S_q), \delta)$, a set
$C = \{e_1, f_1, \ldots, e_d, f_d\}$ of colors, and a color map $[\cdot]\colon S \to 2^C \setminus \emptyset$, we construct
a 2-player game graph $\overline{G} = ((\overline{S}, \overline{E}), (\overline{S}_1, \overline{S}_2), \overline{\delta})$ together with a color map
$[\cdot]\colon \overline{S} \to 2^{\overline{C}} \setminus \emptyset$ for the extended color set $\overline{C} = C \cup \{e_{d+1}, f_{d+1}\}$. The construction
is specified as follows. For every nonprobabilistic state $s \in S_1 \cup S_2$, there is a
corresponding state $\overline{s} \in \overline{S}$ such that (1) $\overline{s} \in \overline{S}_1$ iff $s \in S_1$, and (2) $[\overline{s}] = [s]$,
and (3) $(\overline{s}, \overline{t}) \in \overline{E}$ iff $(s, t) \in E$. Every probabilistic state $s \in S_q$ is replaced by
the gadget shown in Figure 1. In the figure, diamond-shaped states are player-2
states (in $\overline{S}_2$), and square-shaped states are player-1 states (in $\overline{S}_1$). From the
state $\overline{s}$ with $[\overline{s}] = [s]$, the players play the following 3-step game in $\overline{G}$. First,
in state $\overline{s}$ player 2 chooses a successor $(\widetilde{s}, 2k)$, for $k \in \{0, 1, \ldots, d\}$. For every
state $(\widetilde{s}, 2k)$, we have $[(\widetilde{s}, 2k)] = [s]$. For $k \ge 1$, in state $(\widetilde{s}, 2k)$ player 1 chooses
from two successors: state $(\widehat{s}, 2k-1)$ with $[(\widehat{s}, 2k-1)] = e_k$, or state $(\widehat{s}, 2k)$ with
$[(\widehat{s}, 2k)] = f_k$. The state $(\widetilde{s}, 0)$ has only one successor $(\widehat{s}, 0)$, with $[(\widehat{s}, 0)] = f_{d+1}$.
Note that no state in $\overline{S}$ is labeled by the new color $e_{d+1}$, that is, $[e_{d+1}] = \emptyset$.
Finally, in each state $(\widehat{s}, j)$ the choice is between all states $\overline{t}$ such that $(s, t) \in E$,
and it belongs to player 1 if $j$ is odd, and to player 2 if $j$ is even.
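
The gadget can be sketched in code for a single probabilistic state. The sketch below is an interpretation, not the construction as drawn in Figure 1: it assumes the gadget has three distinct layers, tagged here `"bar"` (the entry state), `"tilde"` (the intermediate states indexed by 2k), and `"hat"` (the exit states indexed by j); colors are encoded as pairs such as `("e", 2)`; and the owner of the intermediate state with index 0 is set to player 1 arbitrarily, since that state has a single successor.

```python
def build_gadget(s, colors_of_s, successors, d):
    """Gadget replacing one probabilistic state s with color set colors_of_s
    and successor set successors, for a Rabin condition with d pairs.
    Returns (owner, color, edges) dictionaries over the gadget states."""
    owner, color, edges = {}, {}, {}

    # Step 1: at the entry state, player 2 picks an index k in {0, ..., d}.
    entry = ("bar", s)
    owner[entry] = 2
    color[entry] = set(colors_of_s)
    edges[entry] = {("tilde", s, 2 * k) for k in range(d + 1)}

    # Step 2: for k >= 1 player 1 picks between the e_k exit and the f_k exit;
    # for k = 0 there is a single exit colored f_{d+1}.
    for k in range(d + 1):
        mid = ("tilde", s, 2 * k)
        owner[mid] = 1                      # immaterial for k = 0 (assumption)
        color[mid] = set(colors_of_s)
        if k == 0:
            edges[mid] = {("hat", s, 0)}
        else:
            edges[mid] = {("hat", s, 2 * k - 1), ("hat", s, 2 * k)}

    # Step 3: the exit state hands the choice of the real successor to
    # player 1 if its index is odd, and to player 2 if its index is even.
    exits = {("bar", t) for t in successors}
    for j in range(2 * d + 1):
        out = ("hat", s, j)
        owner[out] = 1 if j % 2 == 1 else 2
        if j == 0:
            color[out] = {("f", d + 1)}
        elif j % 2 == 1:
            color[out] = {("e", (j + 1) // 2)}
        else:
            color[out] = {("f", j // 2)}
        edges[out] = set(exits)

    return owner, color, edges


# Invented example: probabilistic state p with color e_1, successors t and u, d = 2.
owner, color, edges = build_gadget("p", {("e", 1)}, ["t", "u"], d=2)
```

Under these assumptions the gadget for one probabilistic state has 3d + 3 states, so the size of the constructed 2-player graph grows linearly in d per probabilistic state.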

We consider the 2½-player game played on the graph $G$ with the Rabin
condition $P = \{(e_1, f_1), \ldots, (e_d, f_d)\}$ for player 1. Let $\overline{U}_1$ and $\overline{U}_2$ be the sure
winning sets for players 1 and 2, respectively, in the constructed 2-player game
graph $\overline{G}$ with the modified Rabin condition $\overline{P} = \{(e_1, f_1), \ldots, (e_{d+1}, f_{d+1})\}$ for
player 1. Define the sets $U_1$ and $U_2$ in the original 2½-player game graph $G$ by
$U_1 = \{ s \in S \mid \overline{s} \in \overline{U}_1 \}$ and $U_2 = \{ s \in S \mid \overline{s} \in \overline{U}_2 \}$. From the determinacy
of 2-player Rabin games (Theorem 1), it follows that $\overline{U}_1 = \overline{S} \setminus \overline{U}_2$, and hence
$U_1 = S \setminus U_2$.


Lemma 2. In the 2½-player game graph $G$ with the Rabin condition $P$ for
player 1, there exists a pure memoryless strategy $\sigma$ for player 1 such that for all
player-2 strategies $\pi$ and all states $s \in U_1$, we have $\Pr_s^{\sigma,\pi}(\mathrm{Rabin}(P)) = 1$.

Proof. We define a pure memoryless strategy $\sigma$ for player 1 in the game $G$ from
a strategy $\overline{\sigma}$ in the game $\overline{G}$ as follows: for all states $s \in S_1$, if $\overline{\sigma}(\overline{s}) = \overline{t}$, then
set $\sigma(s) = t$. Consider a pure memoryless sure winning strategy $\overline{\sigma}$ in the game
$\overline{G}$ from every state $\overline{s} \in \overline{U}_1$. Our goal is to establish that $\sigma$ is an almost-sure
winning strategy from every state in $U_1$.

For the Rabin objective $\mathrm{Rabin}(P)$, let the set of Rabin pairs be $[P] =
\{ (E_1, F_1), (E_2, F_2), \ldots, (E_d, F_d) \}$. A strongly connected component (s.c.c.) $W$
in a graph $G_1$ is winning for player 1 if there exists $i \in \{1, 2, \ldots, d\}$ such that
$W \cap F_i \neq \emptyset$ and $W \cap E_i = \emptyset$; otherwise $W$ is winning for player 2. If $G_1$ is an
MDP, then an end component $W$ in $G_1$ is winning for player 1 if there exists
$i \in \{1, 2, \ldots, d\}$ such that $W \cap F_i \neq \emptyset$ and $W \cap E_i = \emptyset$; otherwise $W$ is winning
for player 2.

We prove that every end component in the player-2 MDP $(G \upharpoonright U_1)_\sigma$ is
winning for player 1. It would follow from Lemma 1 that $\sigma$ is an almost-sure
winning strategy. We argue that if there is an end component $W$ in $(G \upharpoonright U_1)_\sigma$
that is winning for player 2, then we can construct an s.c.c. in the subgraph
$(\overline{G} \upharpoonright \overline{U}_1)_{\overline{\sigma}}$ that is winning for player 2, which is impossible because $\overline{\sigma}$ is a sure
winning strategy for player 1 from the set $\overline{U}_1$ in the 2-player Rabin game $\overline{G}$. Let
$W$ be an end component in $(G \upharpoonright U_1)_\sigma$ that is winning for player 2. We denote by
$\overline{W}$ the set of states in the gadget of states in $W$. Since $W$ is winning for player 2,
for all $i \in \{1, 2, \ldots, d\}$ we have: if $F_i \cap W \neq \emptyset$, then $W \cap E_i \neq \emptyset$. Let us define
the set $I = \{ i_1, i_2, \ldots, i_j \}$ of indices such that $E_{i_k} \cap W \neq \emptyset$. Thus for all
$i \in (\{1, 2, \ldots, d\} \setminus I)$ we have $F_i \cap W = \emptyset$.
Note that $I \neq \emptyset$, as every state has at least one color. We now construct a
sub-game in $\overline{G}_{\overline{\sigma}}$ as follows:

1. For a state $s \in W \cap S_2$ keep all the edges $(\overline{s}, \overline{t})$ such that $t \in W$.

2. For a state $s \in W \cap S_q$ the sub-game is defined as follows:

- At state $\overline{s}$ choose the edges to state $(\widetilde{s}, 2i)$ such that $i \in I$.

- For a state $s \in W$, let $\mathrm{dis}(s, W \cap E_i)$ denote the shortest distance (BFS
distance) from $s$ to $W \cap E_i$ in the graph of $(G \upharpoonright W)_\sigma$. At state $(\widehat{s}, 2i)$,
which is a player-2 state, player 2 chooses a successor $\overline{s_1}$ such that
$\mathrm{dis}(s_1, W \cap E_i) < \mathrm{dis}(s, W \cap E_i)$ (i.e., shorten distance to the set $W \cap E_i$
in $G$).

We now prove that every terminal s.c.c. is winning for player 2 in the subgame
thus constructed in $(\overline{G} \upharpoonright \overline{W})_{\overline{\sigma}}$, where $\overline{W}$ is the set of states in the gadget of states
in $W$. Consider any arbitrary terminal s.c.c. $Y$ in the subgame constructed in
$(\overline{G} \upharpoonright \overline{W})_{\overline{\sigma}}$. It follows from the construction that for every $i \in (\{1, 2, \ldots, d\} \setminus I)$,
we have $F_i \cap Y = \emptyset$. Suppose that for some $i \in I$ we have $F_i \cap Y \neq \emptyset$; we show that
$E_i \cap Y \neq \emptyset$. There are two cases:

1. If there is at least one state $(s, 2i)$ such that the strategy $\overline{\sigma}$ chooses the
successor $(s, 2i - 1)$, then $E_i \cap Y \neq \emptyset$, since $[(s, 2i - 1)] = e_i$.

2. Else for every state $(s, 2i)$ the strategy for player 1 chooses the successor
$(s, 2i)$. At state $(s, 2i)$, which is a player 2 state, player 2 chooses a successor
$s_1$ that shortens the distance to the set $W \cap E_i$. Hence the terminal s.c.c. $Y$ must
contain a state $s$ such that $[s] = e_i$. Hence $E_i \cap Y \neq \emptyset$.
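
On small examples, the claim that every terminal s.c.c. is winning for player 2 can be checked by brute force. The sketch below is not the paper's procedure. It assumes the subgame is given as a dictionary in which every state appears as a key, and that Rabin pairs are tuples `(E_i, F_i)` of sets; all function names are hypothetical.

```python
def reachable(edges, start):
    """All states reachable from start (including start itself)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in edges[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def terminal_sccs(edges):
    """Terminal s.c.c.s of a graph given as dict: state -> list of successors.

    Assumes every state appears as a key. Brute force, for small examples only.
    """
    reach = {s: reachable(edges, s) for s in edges}
    found = []
    for s in edges:
        # s lies in a terminal s.c.c. iff everything reachable from s reaches s.
        if all(s in reach[t] for t in reach[s]):
            component = frozenset(reach[s])
            if component not in found:
                found.append(component)
    return found


def all_terminal_sccs_win_for_player2(edges, pairs):
    """Player 2 wins Y if no pair has Y ∩ F_i nonempty and Y ∩ E_i empty."""
    return all(
        not any(bool(Y & F) and not (Y & E) for E, F in pairs)
        for Y in terminal_sccs(edges)
    )
```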

We argue that for every probabilistic state $s \in S_{\bigcirc} \cap U_1$, all of its successors
are in $U_1$. Otherwise, player 2 in the state $s$ of the game $\overline{G}$ can choose the
successor $(s, 0)$ and then a successor to its winning set $U_2$, which contradicts
the assumption that the strategy $\overline{\sigma}$ is a sure winning strategy for player 1 in the
game $\overline{G}$ from $U_1$. It follows from Lemma 1 that for all strategies $\pi$, for all states
$s \in U_1$, with probability 1 the set of states visited infinitely often along the
play $\omega_s^{\sigma,\pi}$ is an end component in $U_1$. Since every end component in $(G \upharpoonright U_1)_{\sigma}$
is winning for player 1, the strategy $\sigma$ is an almost-sure winning strategy for
player 1 from $U_1$. ∎

Lemma 3. In the 2½-player game graph $G$ with the Rabin condition $P$ for
player 1, there exists a finite-memory strategy $\pi$ for player 2 such that for all
player-1 strategies $\sigma$ and all states $s \in U_2$, we have $\Pr_s^{\sigma,\pi}(\Omega \setminus \mathrm{Rabin}(P)) > 0$.

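From Lemma 2, it follows that $U_1 \subseteq \langle\langle 1 \rangle\rangle_{almost}(\mathrm{Rabin}(P))$. From Lemma 3, it
follows that $\langle\langle 1 \rangle\rangle_{almost}(\mathrm{Rabin}(P)) \subseteq U_1$. Therefore $U_1 = \langle\langle 1 \rangle\rangle_{almost}(\mathrm{Rabin}(P))$. The
proof of Lemma 2 also establishes the existence of pure memoryless almost-sure
winning strategies for Rabin objectives.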

Theorem 3. The family of pure memoryless strategies suffices for almost-sure
winning with respect to Rabin objectives on 2½-player game graphs.

4  Quantitative  Analysis 

We extend sufficiency results for families of strategies from almost-sure winning
to optimality with respect to all $\omega$-regular objectives. In the following, we fix a
2½-player game graph $G$. Given an $\omega$-regular objective $\Phi$, for every real $r \in \mathbb{R}$
the value class with value $r$ is $VC(r) = \{ s \in S \mid \langle\langle 1 \rangle\rangle_{val}(\Phi)(s) = r \}$. Proposition 1
states that there exist optimal strategies for player 1 that never choose
an edge to a lower value class.
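
As a small illustration, value classes are just the groups of states that share a value. The sketch below assumes the values are already known and stored in a dictionary `value` (exact rationals via `Fraction` avoid floating-point equality issues); the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def value_classes(value):
    """Group states by value: returns {r: VC(r)} for every value r that occurs."""
    classes = defaultdict(set)
    for state, r in value.items():
        classes[r].add(state)
    return dict(classes)


def moves_to_lower_class(value, s, t):
    """True if the edge (s, t) leads to a strictly lower value class."""
    return value[t] < value[s]


# Hypothetical four-state example.
example = {'a': Fraction(1, 2), 'b': Fraction(1, 2), 'c': Fraction(1), 'd': Fraction(0)}
classes = value_classes(example)
```

Here `classes[Fraction(1, 2)]` is `{'a', 'b'}`, and the edge from `'a'` to `'d'` is one that the strategies of Proposition 1 never use.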

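Proposition 1. For all $\omega$-regular objectives $\Phi$, there exists an optimal strategy
$\sigma$ for player 1 such that for all $w \in S^*$, $s \in S_1$, and $t \in S$, if $\langle\langle 1 \rangle\rangle_{val}(\Phi)(t) <
\langle\langle 1 \rangle\rangle_{val}(\Phi)(s)$, then $\sigma(w \cdot s)(t) = 0$.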

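Definition 2 (Boundary probabilistic states). Given an $\omega$-regular objective
$\Phi$, a probabilistic state $s \in S_{\bigcirc}$ is a boundary probabilistic state if there
exists a successor $t \in E(s)$ such that $\langle\langle 1 \rangle\rangle_{val}(\Phi)(t) \neq \langle\langle 1 \rangle\rangle_{val}(\Phi)(s)$. Observe
that for every boundary probabilistic state $s$, there exist $t_1, t_2 \in E(s)$ such that

$\langle\langle 1 \rangle\rangle_{val}(\Phi)(t_1) < \langle\langle 1 \rangle\rangle_{val}(\Phi)(s)$ and $\langle\langle 1 \rangle\rangle_{val}(\Phi)(t_2) > \langle\langle 1 \rangle\rangle_{val}(\Phi)(s)$.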
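
The observation can be illustrated on a toy example. The sketch assumes, as a standard property that is taken as given here, that the value of a probabilistic state is the average of its successors' values weighted by the transition probabilities; a successor with a different value then forces one strictly lower and one strictly higher. The encoding `delta[s]` (a dictionary from successors to probabilities) and the function names are hypothetical.

```python
from fractions import Fraction


def is_boundary(s, delta, value):
    """True if some successor of probabilistic state s has a different value."""
    return any(value[t] != value[s] for t in delta[s])


def boundary_witnesses(s, delta, value):
    """Return (lower, higher): successors with value below / above value[s]."""
    lower = [t for t in delta[s] if value[t] < value[s]]
    higher = [t for t in delta[s] if value[t] > value[s]]
    return lower, higher


def satisfies_averaging(s, delta, value):
    """Check value[s] == sum over successors t of delta[s][t] * value[t]."""
    return value[s] == sum(p * value[t] for t, p in delta[s].items())


# Hypothetical data: 's' is a boundary state, 'u' is not.
delta = {'s': {'t1': Fraction(1, 2), 't2': Fraction(1, 2)}, 'u': {'v': Fraction(1)}}
value = {'s': Fraction(1, 2), 't1': Fraction(0), 't2': Fraction(1),
         'u': Fraction(1, 3), 'v': Fraction(1, 3)}
```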

Lemma 4. Consider a 2½-player game $G$ with an $\omega$-regular objective $\Phi$. Given
a value class $VC(r)$ with $0 < r < 1$, let $B(r)$ be the set of boundary probabilistic
states in the value class $VC(r)$. Convert each state in $B(r)$ into a sink state that
is winning for player 1. Let the new game be $G'$. Then player 1 wins almost-surely
from all states in the subgame with game graph $G' \upharpoonright VC(r)$ and objective $\Phi$.

Proof. Assume that player 1 does not win almost-surely from every state in
$G' \upharpoonright VC(r)$. Then there exists a state where player 2 wins with positive bounded
probability. It follows from Corollary 1 of [7] that there exists a non-empty set $U \subseteq
VC(r)$ such that player 2 wins almost-surely from $U$ in $G' \upharpoonright VC(r)$. Consider
an optimal strategy $\sigma$ that never chooses an edge with positive probability to a
lower value class (such a strategy exists by Proposition 1). Since player 2 wins
almost-surely from $U$, it follows that for every state $s \in U \cap S_1$, for every successor
$t$ of $s$ in $VC(r)$ we have $t \in U$. It follows that every move of the strategy $\sigma$ from
a state in $U$ stays in $U$. Hence player 2 wins almost-surely from $U$ against $\sigma$. This is a
contradiction to the assumption that $r > 0$ and that $\sigma$ is an optimal strategy. ∎
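
The construction used in Lemma 4 can be sketched as a graph transformation: keep only the states of one value class and turn its boundary probabilistic states into sinks. The sketch below is an illustration under assumptions, not the paper's definition of $G'$: the graph is a dictionary of successor lists, the values and the set `boundary` are given as inputs, and the name `conditional_subgame` is hypothetical.

```python
def conditional_subgame(edges, value, boundary, r):
    """Sketch of G' restricted to VC(r).

    edges:    dict state -> iterable of successors (assumed encoding of G)
    value:    dict state -> value of the state
    boundary: set B(r) of boundary probabilistic states in VC(r)
    Returns (sub_edges, sinks); sinks are to be read as winning for player 1.
    """
    vc = {s for s in edges if value[s] == r}
    sub_edges = {}
    for s in vc:
        if s in boundary:
            sub_edges[s] = {s}  # sink: only a self-loop remains
        else:
            # Keep only the edges that stay inside the value class.
            sub_edges[s] = {t for t in edges[s] if t in vc}
    return sub_edges, set(boundary) & vc
```

An almost-sure winning strategy for player 1 in the resulting subgame is what the text later calls a conditional almost-sure winning strategy; computing it is outside the scope of this sketch.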

Definition 3 (Qualitatively optimal strategies). A strategy $\sigma$ is qualitatively
optimal for player 1, for an $\omega$-regular objective $\Phi$, if the following
conditions hold: (a) for every state $s \in \langle\langle 1 \rangle\rangle_{almost}(\Phi)$, the strategy $\sigma$ is almost-sure
winning, and (b) for every state $s \in VC(r)$ such that $0 < r < 1$, there is a
constant $c > 0$ such that $\inf_{\pi \in \Pi} \Pr_s^{\sigma,\pi}(\Phi) > c$.

Lemma 4 shows that in every value class, if the boundary probabilistic states
are assumed to be winning for player 1, then player 1 wins almost-surely. We call
such an almost-sure winning strategy a conditional almost-sure winning strategy.
We compose conditional almost-sure winning strategies in value classes to obtain
an optimal strategy. If a strategy $\sigma$ is conditional almost-sure winning, it follows
that for all player-2 strategies $\pi$ that are optimal against $\sigma$, the play $\omega_s^{\sigma,\pi}$ reaches
the boundary probabilistic states with positive probability, for $s \in VC(r)$ and
$r > 0$. From every boundary probabilistic state the game proceeds to a higher
value class with positive probability. An induction on the number of value classes
yields Lemma 5.

Lemma 5. For every $\omega$-regular objective $\Phi$, if a player-1 strategy $\sigma$ is almost-sure
winning from every state $s \in \langle\langle 1 \rangle\rangle_{almost}(\Phi)$, and is conditionally almost-sure
winning from every state $s \notin \langle\langle 2 \rangle\rangle_{almost}(\Omega \setminus \Phi)$, then $\sigma$ is qualitatively optimal
for $\Phi$.

Definition 4 (Locally optimal strategies). A strategy $\sigma$ is locally optimal
for player 1, for an $\omega$-regular objective $\Phi$, if for all $w \in S^*$, $s \in S_1$, and $t \in S$,
if $\langle\langle 1 \rangle\rangle_{val}(\Phi)(t) < \langle\langle 1 \rangle\rangle_{val}(\Phi)(s)$, then $\sigma(w \cdot s)(t) = 0$.
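
For the special case of a memoryless strategy, where the history $w$ plays no role, local optimality reduces to a check over the edges out of player-1 states. The sketch below covers only that special case and assumes the values are known; the encoding of `sigma` as nested dictionaries and the name `is_locally_optimal` are hypothetical.

```python
def is_locally_optimal(sigma, value, player1_states):
    """Check local optimality of a memoryless strategy.

    sigma: dict state -> dict successor -> probability (assumed encoding)
    value: dict state -> value of the state
    Returns False if some edge to a lower value gets positive probability.
    """
    return all(
        not (prob > 0 and value[t] < value[s])
        for s in player1_states
        for t, prob in sigma[s].items()
    )
```

For instance, with `value = {'s': 0.5, 'a': 0.5, 'b': 0.25}`, the strategy that moves from `'s'` to `'a'` with probability 1 passes the check, while one that gives `'b'` probability 0.5 fails it.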

Note that by definition, a conditional almost-sure winning strategy is locally
optimal. The following lemma generalizes Lemma 5.3 of [4]. Theorem 4 follows
from Lemma 6. Since pure memoryless strategies suffice for almost-sure winning
with respect to Rabin objectives on 2½-player game graphs (Theorem 3),
Theorem 5 is immediate from Theorem 4.

Lemma 6. Consider a 2½-player game $G$ with an $\omega$-regular objective $\Phi$ for
player 1. Let $\sigma$ be a finite-memory strategy such that $\sigma$ is both qualitatively
optimal and locally optimal for $\Phi$. Then $\sigma$ is an optimal strategy for $\Phi$ from all
states of $G$.

Theorem 4. If a family $\Sigma^C$ of strategies suffices for almost-sure winning with
respect to an $\omega$-regular objective $\Phi$ on 2½-player game graphs, then $\Sigma^C$ suffices
for optimality with respect to $\Phi$ on 2½-player game graphs.

Theorem 5. The family of pure memoryless strategies suffices for optimality with
respect to Rabin objectives on 2½-player game graphs.

The existence of pure memoryless optimal strategies for 2½-player game
graphs with Rabin objectives, and of polynomial-time algorithms for computing
the values of MDPs with Streett objectives [2], establishes that the 2½-player
games with Rabin objectives can be decided (qualitatively and quantitatively)
in NP. The NP-hardness follows from the hardness of 2-player Rabin games.
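
The NP upper bound can be read as a guess-and-check argument: the certificate is a pure memoryless strategy for player 1, which names one successor per player-1 state, and fixing it leaves a player-2 MDP whose value is then computed in polynomial time by the algorithm of [2]. The sketch below illustrates only the first step, building the graph of that MDP. The encoding and the name `fix_player1_strategy` are assumptions of this sketch, and the MDP evaluation itself is not implemented here.

```python
def fix_player1_strategy(edges, player1_states, sigma):
    """Restrict each player-1 state to the single successor chosen by sigma.

    edges: dict state -> iterable of successors (assumed encoding)
    sigma: dict mapping every player-1 state to one of its successors,
           i.e. a pure memoryless strategy
    The result is the graph of the player-2 MDP. Transition distributions at
    probabilistic states are unchanged and therefore not represented here.
    """
    mdp_edges = {}
    for s, succs in edges.items():
        if s in player1_states:
            if sigma[s] not in succs:
                raise ValueError(f"sigma picks a non-successor at state {s!r}")
            mdp_edges[s] = {sigma[s]}
        else:
            mdp_edges[s] = set(succs)
    return mdp_edges
```

The certificate has one entry per player-1 state, so its size is polynomial in the size of the game graph.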

Theorem 6. Given a 2½-player game graph $G$, an objective $\Phi$ for player 1,
a state $s$ of $G$, and a rational $r$, the complexity of determining whether
$\langle\langle 1 \rangle\rangle_{val}(\Phi)(s) > r$ is as follows: NP-complete if $\Phi$ is a Rabin objective;
coNP-complete if $\Phi$ is a Streett objective; and in NP $\cap$ coNP if $\Phi$ is a parity objective.